ABSTRACT

Objective: To determine the diagnostic accuracy of 
tuning fork tests for detecting fractures. 
Design: Systematic review of primary studies 
evaluating the diagnostic accuracy of tuning fork tests 
for the presence of fracture. 
Data source: We searched MEDLINE, CINAHL, AMED,
EMBASE, SPORTDiscus, CAB Abstracts and Web of
Science from inception to November 2012. We
manually searched the reference lists of any review 
papers and any identified relevant studies. 
Study selection and data extraction: Two 
reviewers independently reviewed the list of potentially 
eligible studies and rated the studies for quality using 
the QUADAS-2 tool. Data were extracted to form 2x2 
contingency tables. The primary outcome measure was 
the accuracy of the test as measured by its sensitivity 
and specificity with 95% CIs. 
Data synthesis: We included six studies (329 
patients), with two types of tuning fork tests (pain 
induction and loss of sound transmission). Patients in
the included studies were aged 7-60 years. The
prevalence of fracture ranged from 10% to 80%. The 
sensitivity of the tuning fork tests was high, ranging 
from 75% to 100%. The specificity of the tests was 
highly heterogeneous, ranging from 18% to 95%. 
Conclusions: Based on the studies in this review, 
tuning fork tests have some value in ruling out 
fractures, but are not sufficiently reliable or accurate for 
widespread clinical use. The small sample size of the 
studies and the observed heterogeneity make 
generalisable conclusions difficult.
INTRODUCTION

Although imaging for suspected fractures is
generally cheap and readily accessible, there
are situations, such as remote settings, where
imaging is not readily available. Other clinical
tests for fracture may then assist in decision
making. One test, proposed at least 60 years
ago, is the use of a tuning fork. 1
Two methods of using tuning forks to
detect fracture(s) have been developed. The
first method uses a vibrating tuning fork
placed directly over, or closely proximal to,
the suspected fracture site. Because the
periosteum is heavily innervated, mechanical
vibration over a fracture site stimulates the
overlying periosteum, causing pain. 2 The
pain stops or decreases with the removal of
the tuning fork. The second method uses a
vibrating tuning fork placed over a bony
prominence distal to the fracture site. The
examiner listens with a stethoscope over a
bony prominence proximal to the fracture
site; a fracture is indicated by a reduction
in the sound conducted along the bone
compared with the unaffected limb. 1

Strengths and limitations of this study

Based on the studies in this review, tuning fork
tests have value in ruling out some fractures, but
current evidence is insufficient to state the
circumstances when it is reliable.
Quantification of the degree and causes of
heterogeneity of the studies was not feasible,
because of small sample size and varying
methods of the studies.
Therefore, this review does not support the
current clinical use of tuning forks as a triage
test for the diagnosis of fractures.

The aim of this review was to identify the 
techniques used to diagnose fractures using 
a tuning fork and assess all studies of the 
diagnostic accuracy of tuning fork tests for 
the presence of fracture. 

METHODS 

The inclusion criteria for the review were
primary studies that assessed the diagnostic
accuracy of tuning forks, using either pain or
reduction of sound as the index test, measured
against a recognised reference standard, such
as X-ray, MRI or bone scan, for the
diagnosis of fractures. We included studies
that enrolled patients of all ages and in all
clinical settings with no exclusion by the
language of publication. We excluded case
series, case-control studies and narrative
review papers.
authors (KM and JD) and full manuscripts for all
potentially relevant papers were obtained. Two review authors
(KM and JD) independently reviewed each paper for
inclusion according to the predefined inclusion criteria,
rated the study quality and then extracted relevant data.
In the case of duplicate publication, we selected the most
complete version of the study. We resolved disagreements
through discussion with the third author (PG).

The primary outcome measure of interest was the
accuracy of the test as measured by its sensitivity and
specificity. Wherever possible, we used the raw data to
construct 2x2 tables. 95% CIs for sensitivity and specificity
were calculated with the Wilson score method and 95%
CIs for positive and negative likelihood ratios were
calculated with the method described by Simel et al. 4 We
appraised each article using the QUADAS-2 tool.
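The calculations described above can be illustrated with a minimal sketch. It assumes the standard Wilson score formula for the proportions and the log-scale standard error for a ratio of two proportions, which is our reading of the Simel et al approach rather than a reproduction of the authors' own code. The function names and the 2x2 counts are hypothetical and are not taken from any included study.

```python
import math


def wilson_ci(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (95% CI when z=1.96)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


def ratio_with_log_ci(num_events, num_total, den_events, den_total, z=1.96):
    """Ratio of two proportions with a CI built on the log scale.

    Assumed form: SE(ln ratio) = sqrt(1/a - 1/n1 + 1/b - 1/n2).
    Cells with zero events are not handled here and raise ZeroDivisionError.
    """
    ratio = (num_events / num_total) / (den_events / den_total)
    se = math.sqrt(1 / num_events - 1 / num_total
                   + 1 / den_events - 1 / den_total)
    return ratio, ratio * math.exp(-z * se), ratio * math.exp(z * se)


def accuracy_from_2x2(tp, fp, fn, tn):
    """Return (estimate, lower, upper) tuples from a 2x2 contingency table."""
    fractured = tp + fn      # reference standard positive
    not_fractured = fp + tn  # reference standard negative
    return {
        "sensitivity": (tp / fractured, *wilson_ci(tp, fractured)),
        "specificity": (tn / not_fractured, *wilson_ci(tn, not_fractured)),
        # LR+ = sensitivity / (1 - specificity)
        "lr_positive": ratio_with_log_ci(tp, fractured, fp, not_fractured),
        # LR- = (1 - sensitivity) / specificity
        "lr_negative": ratio_with_log_ci(fn, fractured, tn, not_fractured),
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    for name, (est, low, high) in accuracy_from_2x2(18, 10, 2, 20).items():
        print(f"{name}: {est:.2f} (95% CI {low:.2f} to {high:.2f})")
```

The Wilson interval is generally preferred over the simple normal approximation for small samples because it stays within 0 and 1 even when the observed proportion is close to either limit, which matters for studies of the size included here.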

RESULTS 

Literature identification and study quality 

We identified 62 citations from the electronic and
bibliographic searches. Sixteen articles in full text were
obtained for further scrutiny. Six primary studies (329
patients) were included in the final review (figure 1).

The characteristics of the participants and the 
methods of testing are shown in table 1. Most studies 
included only adults; one study included paediatric 



Citations identified from electronic and bibliographic searches after removing duplicates (n=62)

Studies excluded on the basis of title and/or abstract (n=46)

Relevant articles retrieved in full text (n=16)

Full-text articles excluded (n=10):
Review paper (n=1)
Case report (n=1)
Description of technique (n=1)
Earlier version of included study (n=1)
Commentaries on included studies (n=3)
Inappropriate reference test (n=2)
Different method of index test (n=1)

Studies included in review (n=6)

Figure 1 Flow chart of studies included in the review.
Search strategy

We searched MEDLINE, CINAHL, AMED, EMBASE,
SPORTDiscus, CAB Abstracts and Web of Science from
inception to November 2012. We also searched
the reference lists of any identified studies or review
papers, and looked for any systematic reviews or
meta-analyses carried out on this diagnostic test.

The MEDLINE search strategy is shown in box 1, and
was run without a methodological filter.



Data extraction and management 

We selected studies in a two-stage process. The titles and 
abstracts of all search results were screened by two 
Table 1 Characteristics of the included studies
patients. The prevalence of fracture ranged from 10% to
80%. Two studies used the tuning fork test to investigate
any suspected fracture, 2 6 one suspected femoral neck
fracture, 7 one ankle inversion injury 8 and two stress
fractures. 9 10 The studies investigating any fracture, femoral
or ankle fractures used X-ray as a reference standard,
and the studies of stress fractures used either bone scan
or X-ray and bone scan as a reference standard. The
study of patients with ankle inversion injuries included
patients who had tested positive to the 'Ottawa ankle
rule'.

Four studies detected fractures using pain induced by
the vibrating tuning fork, 2 8-10 while two studies used
reduced sound conduction. 6 7 Four studies used a
128 Hz tuning fork alone, 6-9 but two studies compared
the diagnostic accuracy of different frequency tuning
forks within the studies. 2 10

The methodological quality of the included studies
was modest, with important elements that may indicate a
risk of bias being unclear or not reported. For example,
in most studies it was either unclear or not stated
whether the tuning fork test had been interpreted blind
to, and independently of, the reference standard (table 2).

Figure 2 shows sensitivity versus 1-specificity (receiver
operating characteristic plot) for the six included
studies. The sensitivity of the tuning fork tests was
generally high, ranging from 75% to 100%. In the study to
rule out fracture in patients who had tested positive to
the 'Ottawa ankle rule', the use of the tuning fork on
either the tip of the lateral malleolus or the distal fibula
shaft gave a sensitivity of 100%, although there were only
five patients with fractures. 8 However, the specificity of
the test in the six studies was highly heterogeneous,
ranging from 18% to 95%.

Table 2 Methodological quality of the included studies

Figure 2 Sensitivity versus 1-specificity (receiver operating
characteristic) plot of included studies.

Two studies showed reasonable overall diagnostic
accuracy with diagnostic ORs >10, but other studies
showed only modest values (table 3). The two studies
that compared the diagnostic accuracy of different
frequency tuning forks on the same patients found no
differences between frequencies. 2 10 One study assessed the
differences between pain ratings but differences were
small. The study that assessed inter-tester reliability
showed only low reliability.
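For readers unfamiliar with the diagnostic OR used above, a minimal sketch of how it is derived from a 2x2 table may help. The function names and the counts are hypothetical and are not taken from table 3; the formulas are the standard definitions, not necessarily the exact computation used in this review.

```python
def diagnostic_odds_ratio(tp, fp, fn, tn):
    """Diagnostic OR = (TP x TN) / (FP x FN).

    Undefined when FP or FN is zero; such tables are often handled with a
    continuity correction, which is not applied in this sketch.
    """
    if fp == 0 or fn == 0:
        raise ValueError("diagnostic OR is undefined when FP or FN is zero")
    return (tp * tn) / (fp * fn)


def diagnostic_odds_ratio_from_rates(sensitivity, specificity):
    """Equivalent form: odds of a positive test in those with a fracture
    divided by odds of a positive test in those without (LR+ / LR-)."""
    odds_positive_if_fractured = sensitivity / (1 - sensitivity)
    odds_positive_if_not = (1 - specificity) / specificity
    return odds_positive_if_fractured / odds_positive_if_not


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(diagnostic_odds_ratio(tp=18, fp=10, fn=2, tn=20))
    print(diagnostic_odds_ratio_from_rates(18 / 20, 20 / 30))
```

A diagnostic OR of 1 means the test does not discriminate at all; larger values indicate better overall discrimination, but the single number hides whether the test is stronger at ruling in or at ruling out.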



DISCUSSION 

Two forms of tuning fork test, one based on pain
induction and the other on sound transmission,
showed modest diagnostic accuracy with some ability
to rule out fractures. However, the estimated
sensitivity (ranging from 75% to 100%) is not sufficient to be
relied on to rule out fractures based on a negative
test. The specificity is particularly heterogeneous,
potentially resulting in a high proportion of
false-positive test results. The reasons for this variation in
accuracy are unclear, but may be related either to the
way the test is performed or to characteristics of the
injuries and fractures.
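The point about negative tests can be made concrete with a short worked sketch of post-test probability. It takes the lowest reported sensitivity (75%) and the two ends of the reported prevalence range (10% and 80%); the specificity of 60% is an assumed value chosen only for illustration, and this combination does not correspond to any single included study. The function names are hypothetical.

```python
def negative_likelihood_ratio(sensitivity, specificity):
    """LR- = (1 - sensitivity) / specificity."""
    return (1 - sensitivity) / specificity


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert a pre-test probability to a post-test probability via odds."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


if __name__ == "__main__":
    # Sensitivity from the low end of the reported range;
    # specificity is an assumption for illustration only.
    lr_neg = negative_likelihood_ratio(sensitivity=0.75, specificity=0.60)
    for prevalence in (0.10, 0.80):  # ends of the reported prevalence range
        remaining = post_test_probability(prevalence, lr_neg)
        print(f"prevalence {prevalence:.0%}: "
              f"probability of fracture after a negative test {remaining:.1%}")
```

Under these assumptions, a negative test leaves a substantial probability of fracture when the pre-test probability is high, which is consistent with the caution expressed above about relying on a negative result.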

The low inter-tester reliability suggests that the
techniques would benefit from standardisation and training.
Wilder et al 10 compared different frequencies and found
a higher induction of fracture pain using 256 Hz, but
pain also occurred in patients without fractures, resulting
in a low specificity.

Based on the results in this review, the tuning fork test 
was less accurate for stress fractures than other types of 
fractures, but a number of features of this type of injury 
may modify the accuracy. Lesho 9 suggests that in the 
early stages, stress fractures might not be identified by 
the tuning fork test, because the bone shell is still more 
or less intact. A bone scan, however, would show
increased activity in the fractured area. Timing may also
affect the accuracy of the test.

A mineralised callus where fracture healing has been
initiated might not be identified by these tests. It is
unclear whether a discontinuity of the cortical bone is
required in order to give a positive test result. Both types
of tuning fork tests seem to be more accurate in
diagnosing transverse fractures than other types of fractures. It
is also unclear whether swelling or bruising in the area
of the injury might affect the results.

A systematic review, 11 which examined a variety of 
methods for the diagnosis of stress fractures, included 
only two of the six studies we used in this review. 

In conclusion, both tuning fork methods have some
discriminatory ability, but current techniques are not
sufficiently reliable or accurate to rule fractures in or out
and currently should have only limited use in clinical
practice. The small sample size of the studies and the
observed heterogeneity make generalisable conclusions
difficult. However, these tests might be clinically useful
in remote areas or on athletic fields with no easy
access to other options.